ABSTRACT

An alternative definition has been developed of the 
delta scale of item difficulty used at Educational Testing Service. 
The traditional delta scale uses an inverse normal transformation 
based on normal ogive models developed years ago. However, no use is 
made of this fact in typical uses of item deltas. It is simply one
way to make the probability scale of item difficulty a more useful 
set of units that are not compressed near 0 or 1. The alternative 
scale uses a different function to achieve the same result. Both a
logistic definition and a normal definition can be calculated. For 
item difficulty between .10 and .90, the difference between the 
standard definition of the delta scale and one based on the logistic 
distribution is negligible. The logistic definition scales very easy 
items as easier than the normal definition. This change in the delta 
scale, as compared with the traditional delta, would have little 
effect on the values of the statistics used. However, it offers the 
advantage of the use of logits. Also, differences in item deltas 
(e.g., for a comparison of two subpopulations' performance on an
item) can be interpreted in terms of odds-ratios of the corresponding 
difficulty values. (GDC) 
ABSTRACT

This note develops an alternative definition of the "delta scale" of item 
difficulty that is used at ETS. A comparison is given with the traditional 
definition of the delta scale that is based on the normal distribution. Some 
advantages of the alternative scale are mentioned. 
1. THE STANDARD DEFINITION OF AN ITEM'S "DELTA"

Suppose $p$ denotes the proportion of examinees in a given population of
examinees who answer a particular item correctly. The value, $p$, is a population
parameter that measures the easiness of the item, i.e., higher values of $p$
denote easier items. At ETS, the difficulty of an item is measured by a
transformation of $p$ to the "delta scale." The transformation of $p$ into $\Delta$ is given by
the equation

$$\Delta(p) = 13 - 4 z_p$$

where $z_p$ is the usual "z-value" that corresponds to $p$. That is, the probability
that a standard normal deviate is smaller than $z_p$ is $p$. $\Delta(p)$ may also be expressed as

$$\Delta(p) = 13 - 4\Phi^{-1}(p) \tag{1}$$

where $\Phi^{-1}(p)$ denotes the inverse function of the normal cumulative distribution
function, i.e.,

$$\Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}\, du. \tag{2}$$

The value of $\Delta(p)$ is a population measure of the difficulty of an item
because higher values of $\Delta(p)$ denote more difficult items. The location and
scale values of 13 and 4 in (1) are arbitrary, but they ensure that typical 
delta values range from about 5 to about 21. This avoids negative values and 
may have other practical advantages. 
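As a minimal sketch of the transformation in (1), the following Python fragment maps a proportion correct to the normal-ogive delta. The function name `delta_normal` is hypothetical and chosen only for illustration; the inverse normal comes from the standard library's `statistics.NormalDist`.

```python
from statistics import NormalDist


def delta_normal(p: float) -> float:
    """Normal-ogive delta of equation (1): 13 - 4 * Phi^{-1}(p)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    # inv_cdf is the inverse of the standard normal cumulative distribution.
    return 13.0 - 4.0 * NormalDist().inv_cdf(p)


# Easier items (larger p) receive smaller deltas.
for p in (0.02, 0.50, 0.98):
    print(f"p = {p:.2f}   delta = {delta_normal(p):.2f}")
```

An item answered correctly by half the population sits at the scale's center of 13, and proportions of .02 and .98 land near the ends of the "about 5 to about 21" range quoted above.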

The use of the inverse normal transformation $\Phi^{-1}(p)$ in (1) is based on
"normal ogive" types of models for item responses that were developed years ago.
However, no use of this fact is made in typical uses of "item deltas." We
regard the use of $\Phi^{-1}(p)$ as simply one way to stretch out the probability scale
of $p$ into a more useful set of units that are not seriously compressed near $p = 0$
or $p = 1$. The alternative scale given in section 2 uses a different function to
alter the $p$-scale in a similar way.

Estimates of $\Delta(p)$ are used in practice. These are based on samples from
given populations of examinees. Let $\hat{p}$ denote the sample proportion of examinees
(out of $n$) who give the correct answer on the item in question. The sample
delta value is

$$\hat{\Delta} = \Delta(\hat{p}) \tag{3}$$

where $\Delta(p)$ is the function defined in (1).

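The standard error of $\hat{\Delta}$ can be obtained using the $\delta$-method (see Bishop,
Fienberg, and Holland, 1975). It is given by the asymptotic variance formula,

$$\mathrm{Var}(\hat{\Delta}) = 4^2 \cdot 2\pi\, \frac{p(1-p)}{n}\, \exp\!\left((\Phi^{-1}(p))^2\right), \tag{4}$$

so that the standard error of $\hat{\Delta}$ can be estimated by

$$\mathrm{s.e.}(\hat{\Delta}) = 4 \sqrt{2\pi\, \frac{\hat{p}(1-\hat{p})}{n}\, \exp\!\left((\Phi^{-1}(\hat{p}))^2\right)}. \tag{5}$$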
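The large-sample standard error of the sample delta can be sketched in a few lines of Python. This is an illustration under the delta-method form of (4) and (5), not a production routine; the function name `delta_normal_with_se` and the count-based interface are assumptions made for the example.

```python
import math
from statistics import NormalDist
from typing import Tuple


def delta_normal_with_se(x: int, n: int) -> Tuple[float, float]:
    """Sample delta (3) and its estimated standard error (5) from x correct out of n."""
    p_hat = x / n
    if not 0.0 < p_hat < 1.0:
        # Phi^{-1} is infinite at 0 and 1, so the formula breaks down there.
        raise ValueError("the normal delta needs 0 < x < n")
    z = NormalDist().inv_cdf(p_hat)
    delta_hat = 13.0 - 4.0 * z
    # Delta method: Var = 16 * 2*pi * p(1-p)/n * exp(z^2), evaluated at p_hat.
    se = 4.0 * math.sqrt(2.0 * math.pi * p_hat * (1.0 - p_hat) / n * math.exp(z * z))
    return delta_hat, se


delta_hat, se = delta_normal_with_se(x=50, n=100)
print(f"delta = {delta_hat:.2f}, s.e. = {se:.3f}")
```

The guard against `x = 0` and `x = n` reflects a practical limitation of the plug-in estimate: the inverse normal is undefined at proportions of exactly 0 or 1.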



2. AN ALTERNATIVE DEFINITION OF $\Delta(p)$

Lord and Novick (1968, page 399) report that the normal cumulative $\Phi(x)$
and a suitably scaled logistic cumulative differ by no more than .01 for all $x$.
For example, if

$$\Psi(x) = e^x / (1 + e^x), \tag{6}$$

then

$$|\Phi(x) - \Psi(1.7x)| \le .01 \quad \text{for all } x.$$

Hence, we can approximate $\Phi(x)$ by the scaled logistic $\Psi(1.7x)$. This suggests
approximating $\Phi^{-1}(p)$ by

$$\frac{1}{1.7}\,\Psi^{-1}(p), \tag{7}$$
where

$$\Psi^{-1}(p) = \ln\!\left(\frac{p}{1-p}\right), \tag{8}$$

and $\ln(u)$ denotes the natural log of $u$. Hence, the formula for $\Delta(p)$ in (1) can
be approximated by

$$13 - \frac{4}{1.7} \ln\!\left(\frac{p}{1-p}\right). \tag{9}$$

We may create an alternative definition of $\Delta(p)$ by using (9) as its
definition rather than (1). Some reasons for doing this will be mentioned in section
4.

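We will denote by $\Delta_L(p)$ the logistic definition of the $\Delta$-scale for $p$, i.e.,

$$\Delta_L(p) = 13 - \frac{4}{1.7} \ln\!\left(\frac{p}{1-p}\right), \tag{10}$$

or

$$\Delta_L(p) = 13 - 2.35 \ln\!\left(\frac{p}{1-p}\right), \tag{11}$$

and we will denote the normal definition of $\Delta$ by $\Delta_N(p)$, i.e.,

$$\Delta_N(p) = 13 - 4\Phi^{-1}(p). \tag{12}$$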
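The two definitions, and the closeness of the normal and scaled logistic cumulatives that motivates the logistic one, can be checked numerically. The sketch below is illustrative only: the function names and the evaluation grid on [-6, 6] are assumptions, not part of the original note.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1
SCALE = 4 / 1.7            # about 2.35, the multiplier in (10) and (11)


def delta_normal(p: float) -> float:
    """Normal definition (12): 13 - 4 * Phi^{-1}(p)."""
    return 13.0 - 4.0 * STD_NORMAL.inv_cdf(p)


def delta_logistic(p: float) -> float:
    """Logistic definition (10): 13 - (4/1.7) * ln(p / (1 - p))."""
    return 13.0 - SCALE * math.log(p / (1.0 - p))


def max_cdf_gap(lo: float = -6.0, hi: float = 6.0, steps: int = 2400) -> float:
    """Largest |Phi(x) - Psi(1.7x)| on an evenly spaced grid (the grid is an assumption)."""
    gap = 0.0
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        psi = 1.0 / (1.0 + math.exp(-1.7 * x))
        gap = max(gap, abs(STD_NORMAL.cdf(x) - psi))
    return gap


print(f"max |Phi(x) - Psi(1.7x)| on the grid: {max_cdf_gap():.4f}")
for p in (0.25, 0.50, 0.75):
    print(f"p = {p:.2f}  normal = {delta_normal(p):.2f}  logistic = {delta_logistic(p):.2f}")
```

Both definitions agree exactly at $p = .5$, where each returns 13, because the inverse normal and the logit both vanish there.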

3. COMPARATIVE VALUES OF $\Delta_N$ AND $\Delta_L$

The approximation of $\Phi(x)$ by $\Psi(1.7x)$ is quite good for all values of $x$.
However, when we go to the inverses of these two functions we have no guarantee
of a similarly good approximation. This needs to be examined directly. Table 1
and Figure 1 give values of $\Delta_N(p) - \Delta_L(p)$ for $p = .01, .02, \ldots, .99$.
From this we see that for $p$ between .09 and .91 the absolute difference between
$\Delta_N$ and $\Delta_L$ never exceeds .11. As $p$ approaches 0 or 1 the difference grows more rapidly.
The absolute difference exceeds 0.50 for $p \ge .97$ or $p \le .03$, and at $p = .99$ or $p = .01$ it is
1.51.
In rough summary then, for $.10 < p < .90$ the difference between the standard
definition of the delta scale and one based on the logistic distribution is
negligible for practical purposes. For values of $p$ in excess of .90 the
logistic definition of $\Delta$ always yields smaller values of $\Delta$ than does the normal
definition. For values of $p$ smaller than .10 the logistic definition of $\Delta$
always yields values of $\Delta$ that are greater than the normal definition. In many
practical situations (e.g., multiple choice tests) values of $p$ less than .1 are
rarely encountered. In these situations the only real difference that one might
notice between the two definitions of $\Delta$ is that the logistic definition will
scale very easy items (i.e., $p \ge .95$) as easier (i.e., lower $\Delta$ values) than will
the normal definition of $\Delta$.
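The pattern summarized in this section can be reproduced with a short script that evaluates the gap between the normal and logistic deltas on the grid $p = .01, .02, \ldots, .99$. This is a sketch under the definitions in (10) and (12); the helper name `delta_gap` is hypothetical.

```python
import math
from statistics import NormalDist


def delta_gap(p: float) -> float:
    """Normal delta minus logistic delta, i.e. (12) minus (10), at proportion p."""
    normal = 13.0 - 4.0 * NormalDist().inv_cdf(p)
    logistic = 13.0 - (4 / 1.7) * math.log(p / (1.0 - p))
    return normal - logistic


# Index the grid by integer hundredths to avoid float comparisons.
gaps = {k: delta_gap(k / 100) for k in range(1, 100)}

# Hundredths at which the two scales differ by more than half a delta unit.
large = [k for k, g in gaps.items() if abs(g) > 0.5]

# Largest absolute gap over the middle range .09 to .91.
middle = max(abs(gaps[k]) for k in range(9, 92))

print("p (in hundredths) with |gap| > 0.5:", large)
print(f"largest |gap| for .09 <= p <= .91: {middle:.3f}")
print(f"gap at p = .01: {gaps[1]:.2f}, gap at p = .99: {gaps[99]:.2f}")
```

Because the inverse normal and the logit are both antisymmetric about $p = .5$, the gap at $1 - p$ is the negative of the gap at $p$, which is why the two halves of the tabulated values mirror each other.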

4. WHY ANOTHER DEFINITION OF THE DELTA SCALE? 

Our purpose is not to argue strongly for a change in the delta scale that
has been used for a long time at ETS and which is familiar to those who need to
use it in test construction and analysis. Rather, we wish to show that if such
a change were made, it would have little effect on the values of the statistics
that are used but would have some advantages that may prove useful. At the very
least, our analysis shows that useful results that apply to the logistic
definition of the delta scale may be translated into results that almost hold for the
normal definition of this scale.

Possibly the most important advantage of $\Delta_L(p)$ over $\Delta_N(p)$ is that $\Delta_L(p)$
involves "logits". The logit of $p$ is $\log(p/(1-p))$. This is a very well studied
quantity in the statistical (especially biostatistical) literature. For
example, it is known that a good estimator of $\Delta_L(p)$ is not the obvious $\Delta_L(\hat{p})$ but
the smoother

$$\tilde{\Delta}_L = 13 - \frac{4}{1.7} \ln\!\left(\frac{X + .5}{n - X + .5}\right), \tag{13}$$

where $\hat{p} = X/n$ and $X$ is the sample number correct ($\hat{p}$ is the sample proportion
correct). The estimator in (13) is unbiased to order $O(n^{-2})$, unlike the more
obvious estimate, $\Delta_L(\hat{p})$. The bias of $\Delta_L(\hat{p})$ is $O(n^{-1})$, so that while $\Delta_L(\hat{p})$ and
$\tilde{\Delta}_L$ from (13) both converge to the true population value $\Delta_L(p)$ as $n \to \infty$, $\Delta_L(\hat{p})$ does so
at a slower rate than does $\tilde{\Delta}_L$.

Formula (13) is derived from the Haldane-Anscombe estimator of the logit of
$p$; see Bedrick (1984).

In addition, the standard error of $\tilde{\Delta}_L$ can be estimated well using the
formula

$$\mathrm{s.e.}(\tilde{\Delta}_L) = \frac{4}{1.7} \sqrt{\frac{1}{X + .5} + \frac{1}{n - X + .5}}. \tag{14}$$

The formula in (14) is derived from the work of Bedrick (1984) on estimators of
the standard deviation of the Haldane-Anscombe estimate of the logit of $p$.
Bedrick shows that the square of (14) provides an unbiased estimate of the
variance of $\tilde{\Delta}_L$ to order $O(n^{-3})$. Hence, (13) and (14) provide a rather complete
package for estimating $\Delta_L(p)$ that has good statistical properties, even in
relatively small samples. No such claim can be made for the corresponding formulas
(12) and (5) to estimate $\Delta_N(p)$. They are only justified in large samples.
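A small-sample estimate of the logistic delta and its standard error might be computed as follows. This sketch assumes the usual Haldane-Anscombe form, in which one half is added to both the number correct and the number incorrect before taking the log; that constant, and the function name `delta_logistic_smoothed`, are assumptions of the example rather than details confirmed by the text.

```python
import math
from typing import Tuple

SCALE = 4 / 1.7  # multiplier that puts the logit on the delta scale


def delta_logistic_smoothed(x: int, n: int) -> Tuple[float, float]:
    """Smoothed logistic delta and its estimated standard error from x correct out of n.

    Assumes the Haldane-Anscombe correction of 0.5 added to each count.
    """
    if not 0 <= x <= n:
        raise ValueError("x must satisfy 0 <= x <= n")
    logit = math.log((x + 0.5) / (n - x + 0.5))
    delta = 13.0 - SCALE * logit
    se = SCALE * math.sqrt(1.0 / (x + 0.5) + 1.0 / (n - x + 0.5))
    return delta, se


# Unlike the plug-in estimate, the smoothed form stays finite at x = 0 or x = n.
for x, n in ((0, 20), (10, 20), (20, 20)):
    delta, se = delta_logistic_smoothed(x, n)
    print(f"x = {x:2d}, n = {n}: delta = {delta:.2f}, s.e. = {se:.2f}")
```

One practical consequence of the added constant is that items every sampled examinee answered correctly, or incorrectly, still receive a finite delta and standard error.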

A second virtue of the logistic definition of $\Delta$ is that differences in item
deltas (say, in a comparison of the performance of two subpopulations of
examinees on the same item) can be interpreted in terms of odds-ratios of the
corresponding $p$ values. For example, suppose $p_1$ is the proportion in group 1
who got the item correct while $p_2$ is the corresponding proportion in group 2.

If we form the difference,

$$\Delta_L(p_1) - \Delta_L(p_2),$$

a bit of algebra reveals it to equal

$$-\frac{4}{1.7} \ln\!\left(\frac{p_1}{1-p_1} \Big/ \frac{p_2}{1-p_2}\right) \tag{15}$$

which, except for the factor $-4/1.7$, is the log of the odds-ratio

$$\frac{p_1}{1-p_1} \Big/ \frac{p_2}{1-p_2}. \tag{16}$$



The odds-ratio is also the cross-product ratio for the following 2x2 table,

                Right        Wrong         Total
    Group 1     $p_1$        $1 - p_1$     1
    Group 2     $p_2$        $1 - p_2$     1          (17)

i.e., the cross-product ratio is

$$\frac{p_1 (1 - p_2)}{p_2 (1 - p_1)}. \tag{18}$$
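The link between a difference in logistic deltas and the odds-ratio can be verified numerically. The proportions .70 and .60 below are made-up illustrative values, and the function names are hypothetical.

```python
import math

SCALE = 4 / 1.7


def delta_logistic(p: float) -> float:
    """Logistic delta: 13 - (4/1.7) * ln(p / (1 - p))."""
    return 13.0 - SCALE * math.log(p / (1.0 - p))


def odds_ratio(p1: float, p2: float) -> float:
    """Odds of a correct answer in group 1 divided by the odds in group 2."""
    return (p1 / (1.0 - p1)) / (p2 / (1.0 - p2))


p1, p2 = 0.70, 0.60  # illustrative proportions correct in groups 1 and 2

delta_diff = delta_logistic(p1) - delta_logistic(p2)
from_odds = -SCALE * math.log(odds_ratio(p1, p2))
cross_product = (p1 * (1.0 - p2)) / (p2 * (1.0 - p1))

print(f"difference in deltas:       {delta_diff:.4f}")
print(f"-(4/1.7) * ln(odds-ratio):  {from_odds:.4f}")
print(f"odds-ratio {odds_ratio(p1, p2):.4f} vs cross-product ratio {cross_product:.4f}")
```

The location constant 13 cancels in the subtraction, which is why only the scale factor and the log odds-ratio remain.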



The cross-product ratio and its natural log are widely regarded as useful,
margin-free measures of association in 2x2 tables. By margin-free we mean
that if the marginal distributions of the 2x2 table in (17) are modified by
multiplying each row and column by factors then the cross-product ratio is
unchanged. The margin-free nature of the cross-product ratio is quite important
for test development use of the $\Delta$-scale since it ensures that changes in the
overall correct answer rate of an item for a population will have a minimal
effect on the comparison of item deltas for subgroups within the population.
For example, differences in deltas found in one test administration will tend to
hold up in other test administrations. Hence, the use of $\Delta_L(p)$ rather than
$\Delta_N(p)$ brings the comparison of item difficulty indices into line with a
well-established statistical theory of dependence in 2x2 tables, e.g., Bishop,
Fienberg, and Holland (1975, chapter 11).
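The margin-free property can be illustrated with a short sketch: multiplying the rows and columns of a 2x2 table by arbitrary positive factors leaves the cross-product ratio unchanged. The table entries and scaling factors below are invented for illustration, and the function names are hypothetical.

```python
import math


def cross_product_ratio(table):
    """Cross-product ratio (a*d)/(b*c) of a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)


def rescale_margins(table, row_factors, col_factors):
    """Multiply row i by row_factors[i] and column j by col_factors[j]."""
    return [[table[i][j] * row_factors[i] * col_factors[j] for j in range(2)]
            for i in range(2)]


# Rows are groups 1 and 2; columns are proportions right and wrong.
table = [[0.70, 0.30], [0.60, 0.40]]
rescaled = rescale_margins(table, row_factors=(3.0, 0.5), col_factors=(2.0, 7.0))

print(f"original ratio: {cross_product_ratio(table):.4f}")
print(f"rescaled ratio: {cross_product_ratio(rescaled):.4f}")
print("unchanged:", math.isclose(cross_product_ratio(table), cross_product_ratio(rescaled)))
```

Each row factor appears once in the numerator and once in the denominator of the ratio, as does each column factor, so all four cancel.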
FIGURE 1. PLOT OF $\Delta_N(p) - \Delta_L(p)$ VERSUS $p$.
TABLE 1. VALUES OF $p$ AND $\Delta_N(p) - \Delta_L(p)$ FOR $p$ = .01 (.01) .99

     p     $\Delta_N(p) - \Delta_L(p)$          p     $\Delta_N(p) - \Delta_L(p)$

    .01        -1.51                           .51         -.01
    .02         -.94                           .52         -.01
    .03         -.66                           .53         -.02
    .04         -.48                           .54         -.02
    .05         -.35                           .55         -.03
    .06         -.26                           .56         -.04
    .07         -.18                           .57         -.04
    .08         -.13                           .58         -.05
    .09         -.08                           .59         -.05
    .10         -.04                           .60         -.06
    .11         -.01                           .61         -.06
    .12          .01                           .62         -.07
    .13          .03                           .63         -.08
    .14          .05                           .64         -.08
    .15          .06                           .65         -.08
    .16          .08                           .66         -.09
    .17          .09                           .67         -.09
    .18          .09                           .68         -.10
    .19          .10                           .69         -.10
    .20          .10                           .70         -.10
    .21          .11                           .71         -.11
    .22          .11                           .72         -.11
    .23          .11                           .73         -.11
    .24          .11                           .74         -.11
    .25          .11                           .75         -.11
    .26          .11                           .76         -.11
    .27          .11                           .77         -.11
    .28          .11                           .78         -.11
    .29          .11                           .79         -.11
    .30          .10                           .80         -.10
    .31          .10                           .81         -.10
    .32          .10                           .82         -.09
    .33          .09                           .83         -.09
    .34          .09                           .84         -.08
    .35          .08                           .85         -.06
    .36          .08                           .86         -.05
    .37          .08                           .87         -.03
    .38          .07                           .88         -.01
    .39          .06                           .89          .01
    .40          .06                           .90          .04
    .41          .05                           .91          .08
    .42          .05                           .92          .13
    .43          .04                           .93          .18
    .44          .04                           .94          .26
    .45          .03                           .95          .35
    .46          .02                           .96          .48
    .47          .02                           .97          .66
    .48          .01                           .98          .94
    .49          .01                           .99         1.51
    .50          .00